{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Explaining XGBoost predictions on the Titanic dataset\n",
    "\n",
    "This tutorial will show you how to analyze predictions of an XGBoost classifier (regression for XGBoost and most scikit-learn tree ensembles are also supported by eli5).\n",
    "We will use [Titanic dataset](https://www.kaggle.com/c/titanic/data), which is small and has not too many\n",
    "features, but is still interesting enough.\n",
    "\n",
    "We are using [XGBoost](https://xgboost.readthedocs.io/en/latest/) 0.6a2 and data downloaded from https://www.kaggle.com/c/titanic/data (it is also bundled in the eli5 repo: https://github.com/TeamHG-Memex/eli5/blob/master/notebooks/titanic-train.csv).\n",
    "\n",
    "\n",
    "## 1. Training data\n",
    "\n",
    "Let's start by loading the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[OrderedDict([('PassengerId', '1'),\n",
       "              ('Survived', '0'),\n",
       "              ('Pclass', '3'),\n",
       "              ('Name', 'Braund, Mr. Owen Harris'),\n",
       "              ('Sex', 'male'),\n",
       "              ('Age', '22'),\n",
       "              ('SibSp', '1'),\n",
       "              ('Parch', '0'),\n",
       "              ('Ticket', 'A/5 21171'),\n",
       "              ('Fare', '7.25'),\n",
       "              ('Cabin', ''),\n",
       "              ('Embarked', 'S')])]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import csv\n",
    "import numpy as np\n",
    "\n",
    "with open('titanic-train.csv', 'rt') as f:\n",
    "    data = list(csv.DictReader(f))\n",
    "data[:1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Variable descriptions:\n",
    "\n",
    "- **Age:** Age\n",
    "- **Cabin:** Cabin\n",
    "- **Embarked:** Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)\n",
    "- **Fare:** Passenger Fare\n",
    "- **Name:** Name\n",
    "- **Parch:** Number of Parents/Children Aboard\n",
    "- **Pclass:** Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)\n",
    "- **Sex:** Sex\n",
    "- **Sibsp:** Number of Siblings/Spouses Aboard\n",
    "- **Survived:** Survival (0 = No; 1 = Yes)\n",
    "- **Ticket:** Ticket Number\n",
    "\n",
    "Next, shuffle data and separate features from what we are trying to predict: survival."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "891 items total, 38.4% true\n"
     ]
    }
   ],
   "source": [
    "from sklearn.utils import shuffle\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "_all_xs = [{k: v for k, v in row.items() if k != 'Survived'} for row in data]\n",
    "_all_ys = np.array([int(row['Survived']) for row in data])\n",
    "\n",
    "all_xs, all_ys = shuffle(_all_xs, _all_ys, random_state=0)\n",
    "train_xs, valid_xs, train_ys, valid_ys = train_test_split(\n",
    "    all_xs, all_ys, test_size=0.25, random_state=0)\n",
    "print('{} items total, {:.1%} true'.format(len(all_xs), np.mean(all_ys)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We do just minimal preprocessing: convert obviously contiuous *Age* and *Fare* variables to floats,\n",
    "and *SibSp*, *Parch* to integers. Missing *Age* values are removed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "for x in all_xs:\n",
    "    if x['Age']:\n",
    "        x['Age'] = float(x['Age'])\n",
    "    else:\n",
    "        x.pop('Age')\n",
    "    x['Fare'] = float(x['Fare'])\n",
    "    x['SibSp'] = int(x['SibSp'])\n",
    "    x['Parch'] = int(x['Parch'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Simple XGBoost classifier\n",
    "\n",
    "Let's first build a very simple classifier with [xbgoost.XGBClassifier](http://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier)\n",
    "and [sklearn.feature_extraction.DictVectorizer](http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html), and check its accuracy with 10-fold cross-validation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.823 ± 0.071\n"
     ]
    }
   ],
   "source": [
    "import warnings\n",
    "# xgboost <= 0.6a2 shows a warning when used with scikit-learn 0.18+\n",
    "warnings.filterwarnings('ignore', category=DeprecationWarning) \n",
    "from xgboost import XGBClassifier\n",
    "from sklearn.feature_extraction import DictVectorizer\n",
    "from sklearn.pipeline import make_pipeline\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "class CSCTransformer:\n",
    "    def transform(self, xs):\n",
    "        # work around https://github.com/dmlc/xgboost/issues/1238#issuecomment-243872543\n",
    "        return xs.tocsc()\n",
    "    def fit(self, *args):\n",
    "        return self\n",
    "    \n",
    "clf = XGBClassifier()\n",
    "vec = DictVectorizer()\n",
    "pipeline = make_pipeline(vec, CSCTransformer(), clf)\n",
    "\n",
    "def evaluate(_clf):\n",
    "    scores = cross_val_score(_clf, all_xs, all_ys, scoring='accuracy', cv=10)\n",
    "    print('Accuracy: {:.3f} ± {:.3f}'.format(np.mean(scores), 2 * np.std(scores)))\n",
    "    _clf.fit(train_xs, train_ys)  # so that parts of the original pipeline are fitted\n",
    "     \n",
    "evaluate(pipeline)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is one tricky bit about the code above: XGBClassifier in xgboost 0.6a2 has some\n",
    "[issues](https://github.com/dmlc/xgboost/issues/1238) with sparse data.\n",
    "One way to solve them is to convert a sparse matrix to CSC format, so we add a ``CSCTransformer`` to the pipelne.\n",
    "One may be templed to just pass ``dense=True`` to ``DictVectorizer``: after all, in this case the matrixes are small. But this is not a great solution, because we will loose the ability to distinguish features that are missing and features that have zero value.\n",
    "\n",
    "\n",
    "## 3. Explaining weights\n",
    "\n",
    "In order to calculate a prediction, XGBoost sums predictions of all its trees.\n",
    "The number of trees is controlled by ``n_estimators`` argument and is 100 by default.\n",
    "Each tree is not a great predictor on it's own, but by summing across all trees,\n",
    "XGBoost is able to provide a robust estimate in many cases. Here is one of the trees:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0:[Sex=female<-9.53674e-07] yes=1,no=2,missing=1\n",
      "\t1:[Age<13] yes=3,no=4,missing=4\n",
      "\t\t3:[SibSp<2] yes=7,no=8,missing=7\n",
      "\t\t\t7:leaf=0.145455\n",
      "\t\t\t8:leaf=-0.125\n",
      "\t\t4:[Fare<26.2687] yes=9,no=10,missing=9\n",
      "\t\t\t9:leaf=-0.151515\n",
      "\t\t\t10:leaf=-0.0727273\n",
      "\t2:[Pclass=3<-9.53674e-07] yes=5,no=6,missing=5\n",
      "\t\t5:[Fare<12.175] yes=11,no=12,missing=12\n",
      "\t\t\t11:leaf=0.05\n",
      "\t\t\t12:leaf=0.175194\n",
      "\t\t6:[Fare<24.8083] yes=13,no=14,missing=14\n",
      "\t\t\t13:leaf=0.0365591\n",
      "\t\t\t14:leaf=-0.152\n",
      "\n"
     ]
    }
   ],
   "source": [
    "booster = clf.booster()\n",
    "original_feature_names = booster.feature_names\n",
    "booster.feature_names = vec.get_feature_names()\n",
    "print(booster.get_dump()[0])\n",
    "# recover original feature names\n",
    "booster.feature_names = original_feature_names"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that this tree checks *Sex*, *Age*, *Pclass*, *Fare* and *SibSp* features. ``leaf`` gives the decision of a single tree, and they are summed over all trees in the ensemble.\n",
    "\n",
    "Let's check feature importances with eli5.show_weights:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
       "    <thead>\n",
       "    <tr style=\"border: none;\">\n",
       "        <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
       "        <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "    </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.4278\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Sex=female\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 88.46%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.1949\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Pclass=3\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 94.57%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0665\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Embarked=S\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 95.49%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0510\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Pclass=2\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.06%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0420\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                SibSp\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.08%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0417\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Cabin=\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.29%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0385\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Embarked=C\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.47%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0358\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Ticket=1601\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.66%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0331\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Age\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.72%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0323\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Fare\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.49%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0220\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Pclass=1\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 98.15%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0143\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Parch\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Roebling, Mr. Washington Augustus II\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Rosblom, Mr. Viktor Richard\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Ross, Mr. John Hugo\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Rush, Mr. Alfred George John\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Rouse, Mr. Richard Henry\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Ryerson, Miss. Emily Borie\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name=Ryerson, Miss. Susan Parker &quot;Suzette&quot;\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "    \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 100.00%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 1972 more &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "    \n",
       "    </tbody>\n",
       "</table>\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from eli5 import show_weights\n",
    "show_weights(clf, vec=vec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are several different ways to calculate feature importances. By default, \"gain\" is used, that is the average gain of the feature when it is used in trees. Other types are \"weight\" - the number of times a feature is used to split the data, and \"cover\" - the average coverage of the feature. You can pass it with ``importance_type`` argument.\n",
    "\n",
    "Now we know that two most important features are *Sex=female* and *Pclass=3*, but we still don't know how XGBoost decides what prediction to make based on their values.\n",
    "\n",
    "\n",
    "## 4. Explaining predictions\n",
    "\n",
    "To get a better idea of how our classifier works, let's examine individual predictions with eli5.show_prediction:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "            \n",
       "                \n",
       "                \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=1\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.566</b>, score <b>0.264</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.673\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Sex=female\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.479\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Embarked=S\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            Missing\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 97.83%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.070\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            7.879\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.004\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.63%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.006\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.50%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.009\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Pclass=2\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            Missing\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.47%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.009\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Ticket=1601\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            Missing\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.38%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.012\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Embarked=C\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            Missing\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 97.81%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.071\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 97.77%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.073\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Pclass=1\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            Missing\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 96.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.147\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Age\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            19.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.08%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.528\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 85.09%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.100\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "            \n",
       "        \n",
       "\n",
       "        \n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from eli5 import show_prediction\n",
    "show_prediction(clf, valid_xs[1], vec=vec, show_feature_values=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Weight means how much each feature contributed to the final prediction across all trees.\n",
    "The idea for weight calculation is described in http://blog.datadive.net/interpreting-random-forests/;\n",
    "eli5 provides an independent implementation of this algorithm for XGBoost and most scikit-learn tree ensembles.\n",
    "\n",
    "Here we see that classifier thinks it's good to be a female, but bad to travel third class.\n",
    "Some features have \"Missing\" as value\n",
    "(we are passing ``show_feature_values=True`` to view the values):\n",
    "that means that the feature was missing,\n",
    "so in this case it's good to not have embarked in Southampton. This is where our decision to go with sparse matrices comes handy - we still see that *Parch* is zero, not missing.\n",
    "\n",
    "It's possible to show only features that are present using ``feature_filter`` argument: it's\n",
    "a function that accepts feature name and value, and returns True value for features that should be shown:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "            \n",
       "                \n",
       "                \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=1\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.566</b>, score <b>0.264</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.673\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Sex=female\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 97.83%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.070\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            7.879\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.004\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 99.63%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.006\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 97.81%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.071\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 96.36%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.147\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Age\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            19.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 91.08%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.528\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 85.09%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -1.100\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "            \n",
       "        \n",
       "\n",
       "        \n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "no_missing = lambda feature_name, feature_value: not np.isnan(feature_value)\n",
    "show_prediction(clf, valid_xs[1], vec=vec, show_feature_values=True, feature_filter=no_missing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Adding text features\n",
    "\n",
    "Right now we treat *Name* field as categorical, like other text features.\n",
    "But in this dataset each name is unique, so XGBoost does not use this feature at all, because it's\n",
    "such a poor discriminator: it's absent from the weights table in section 3.\n",
    "\n",
    "But *Name* still might contain some useful information. We don't want to guess how to best pre-process it\n",
    "and what features to extract, so let's use the most general character ngram vectorizer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Accuracy: 0.839 ± 0.081\n"
     ]
    }
   ],
   "source": [
    "from sklearn.pipeline import FeatureUnion\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "\n",
    "vec2 = FeatureUnion([\n",
    "    ('Name', CountVectorizer(\n",
    "        analyzer='char_wb',\n",
    "        ngram_range=(3, 4),\n",
    "        preprocessor=lambda x: x['Name'],\n",
    "        max_features=100,\n",
    "    )),\n",
    "    ('All', DictVectorizer()),\n",
    "])\n",
    "clf2 = XGBClassifier()\n",
    "pipeline2 = make_pipeline(vec2, CSCTransformer(), clf2)\n",
    "evaluate(pipeline2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case the pipeline is more complex, we slightly improved our result,\n",
    "but the improvement is not significant. Let's look at feature importances:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        <table class=\"eli5-weights eli5-feature-importances\" style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto;\">\n",
       "    <thead>\n",
       "    <tr style=\"border: none;\">\n",
       "        <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">Weight</th>\n",
       "        <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "    </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.3138\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0.1em 0 0.1em\" title=\"A space symbol\">&emsp;</span>Mr.\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 92.18%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0821\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Pclass=3\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 94.92%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0443\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__sso\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.18%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0294\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Sex=female\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 96.97%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0212\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__lia\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.04%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0205\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Fare\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.06%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0203\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Ticket=1601\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.12%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0197\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Embarked=S\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.23%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0187\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0.1em 0 0.1em\" title=\"A space symbol\">&emsp;</span>Ma\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.33%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0177\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Cabin=\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.38%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0172\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0.1em 0 0.1em\" title=\"A space symbol\">&emsp;</span>Mar\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.42%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0168\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__s,<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0 0 0.1em\" title=\"A space symbol\">&emsp;</span>\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.51%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0160\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0.1em 0 0.1em\" title=\"A space symbol\">&emsp;</span>Mr\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.54%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0157\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__son\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.76%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0138\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__ne<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0 0 0.1em\" title=\"A space symbol\">&emsp;</span>\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.76%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0137\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__ber\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.77%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0136\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__SibSp\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.78%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0136\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                Name__e,<span style=\"background-color: hsl(120, 80%, 70%); margin: 0 0 0 0.1em\" title=\"A space symbol\">&emsp;</span>\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.80%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0134\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Pclass=1\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "        <tr style=\"background-color: hsl(120, 100.00%, 97.91%); border: none;\">\n",
       "            <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "                0.0125\n",
       "                \n",
       "            </td>\n",
       "            <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "                All__Embarked=C\n",
       "            </td>\n",
       "        </tr>\n",
       "    \n",
       "    \n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 97.91%); border: none;\">\n",
       "                <td colspan=\"2\" style=\"padding: 0 0.5em 0 0.5em; text-align: center; border: none; white-space: nowrap;\">\n",
       "                    <i>&hellip; 2072 more &hellip;</i>\n",
       "                </td>\n",
       "            </tr>\n",
       "        \n",
       "    \n",
       "    </tbody>\n",
       "</table>\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "show_weights(clf2, vec=vec2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We see that now there is a lot of features that come from the *Name* field\n",
    "(in fact, a classifier based on *Name* alone gives about 0.79 accuracy).\n",
    "Name features listed in this way are not very informative, they make more sense\n",
    "when we check out predictions.\n",
    "We hide missing features here because there is a lot of missing features in text,\n",
    "but they are not very interesting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "\n",
       "        \n",
       "    \n",
       "        \n",
       "        \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=1\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.771</b>, score <b>1.215</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.995\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Name: Highlighted in text (sum)\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            \n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.43%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.347\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            17.800\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.69%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.236\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Sex=female\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 95.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.109\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Age\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            18.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 98.32%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.029\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 96.91%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.069\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 94.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.150\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Embarked=S\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 93.15%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.215\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 86.98%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.539\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 80.89%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.932\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n",
       "    <p style=\"margin-bottom: 2.5em; margin-top:-0.5em;\">\n",
       "        <b>Name:</b> <span style=\"opacity: 0.80\">Arnold-Franchi,</span><span style=\"background-color: hsl(120, 100.00%, 83.64%); opacity: 0.86\" title=\"0.067\"> Mrs</span><span style=\"opacity: 0.80\">. Josef (Josefi</span><span style=\"background-color: hsl(120, 100.00%, 60.00%); opacity: 1.00\" title=\"0.242\">ne </span><span style=\"opacity: 0.80\">Franchi)</span>\n",
       "    </p>\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "\n",
       "        \n",
       "    \n",
       "        \n",
       "        \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=0\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.905</b>, score <b>-2.248</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.948\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Name: Highlighted in text (sum)\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            \n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 86.54%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.539\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 89.33%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.387\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.80%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.221\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Age\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            45.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 96.73%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.071\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 97.94%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.037\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 96.86%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.067\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Pclass=1\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 87.37%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.492\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            26.550\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n",
       "    <p style=\"margin-bottom: 2.5em; margin-top:-0.5em;\">\n",
       "        <b>Name:</b> <span style=\"opacity: 0.80\">Romain</span><span style=\"background-color: hsl(120, 100.00%, 86.78%); opacity: 0.84\" title=\"0.056\">e,</span><span style=\"background-color: hsl(120, 100.00%, 60.00%); opacity: 1.00\" title=\"0.270\"> </span><span style=\"background-color: hsl(120, 100.00%, 65.95%); opacity: 0.96\" title=\"0.215\">Mr</span><span style=\"background-color: hsl(120, 100.00%, 65.63%); opacity: 0.96\" title=\"0.218\">.</span><span style=\"opacity: 0.80\"> Ch</span><span style=\"background-color: hsl(0, 100.00%, 87.44%); opacity: 0.84\" title=\"-0.052\">arl</span><span style=\"background-color: hsl(120, 100.00%, 92.42%); opacity: 0.82\" title=\"0.025\">es </span><span style=\"opacity: 0.80\">Hallace (&quot;Mr C Rolmane&quot;)</span>\n",
       "    </p>\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "\n",
       "        \n",
       "    \n",
       "        \n",
       "        \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=0\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.941</b>, score <b>-2.762</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +1.946\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            8.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 87.97%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.942\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            69.550\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 90.44%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.678\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 91.86%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.539\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 96.52%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.160\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            2.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 97.97%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.074\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Embarked=S\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 98.95%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.029\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 90.53%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.669\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Name: Highlighted in text (sum)\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            \n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n",
       "    <p style=\"margin-bottom: 2.5em; margin-top:-0.5em;\">\n",
       "        <b>Name:</b> <span style=\"opacity: 0.80\">Sag</span><span style=\"background-color: hsl(120, 100.00%, 79.23%); opacity: 0.88\" title=\"0.112\">e,</span><span style=\"background-color: hsl(0, 100.00%, 71.77%); opacity: 0.92\" title=\"-0.174\"> </span><span style=\"background-color: hsl(0, 100.00%, 60.00%); opacity: 1.00\" title=\"-0.286\">Ma</span><span style=\"background-color: hsl(0, 100.00%, 74.79%); opacity: 0.90\" title=\"-0.148\">s</span><span style=\"opacity: 0.80\">ter. Thomas Henry</span>\n",
       "    </p>\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "\n",
       "        \n",
       "    \n",
       "        \n",
       "        \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=1\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.679</b>, score <b>0.750</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.35%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.236\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Sex=female\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.59%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.226\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            7.879\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.67%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.141\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Name: Highlighted in text (sum)\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            \n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 99.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.010\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 98.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.029\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 97.75%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.041\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 86.37%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.539\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.932\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n",
       "    <p style=\"margin-bottom: 2.5em; margin-top:-0.5em;\">\n",
       "        <b>Name:</b> <span style=\"opacity: 0.80\">Mockl</span><span style=\"background-color: hsl(120, 100.00%, 70.66%); opacity: 0.93\" title=\"0.059\">e</span><span style=\"background-color: hsl(120, 100.00%, 60.00%); opacity: 1.00\" title=\"0.091\">r,</span><span style=\"background-color: hsl(120, 100.00%, 80.52%); opacity: 0.87\" title=\"0.033\"> </span><span style=\"opacity: 0.80\">Miss. Helen</span><span style=\"background-color: hsl(0, 100.00%, 87.51%); opacity: 0.84\" title=\"-0.017\"> </span><span style=\"background-color: hsl(0, 100.00%, 75.91%); opacity: 0.90\" title=\"-0.044\">Ma</span><span style=\"background-color: hsl(0, 100.00%, 82.98%); opacity: 0.86\" title=\"-0.027\">r</span><span style=\"opacity: 0.80\">y &quot;Ellie&quot;</span>\n",
       "    </p>\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "\n",
       "    <style>\n",
       "    table.eli5-weights tr:hover {\n",
       "        filter: brightness(85%);\n",
       "    }\n",
       "</style>\n",
       "\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "        \n",
       "\n",
       "    \n",
       "\n",
       "        \n",
       "\n",
       "        \n",
       "    \n",
       "        \n",
       "        \n",
       "    \n",
       "        <p style=\"margin-bottom: 0.5em; margin-top: 0em\">\n",
       "            <b>\n",
       "    \n",
       "        y=1\n",
       "    \n",
       "</b>\n",
       "\n",
       "    \n",
       "    (probability <b>0.660</b>, score <b>0.663</b>)\n",
       "\n",
       "top features\n",
       "        </p>\n",
       "    \n",
       "    <table class=\"eli5-weights\"\n",
       "           style=\"border-collapse: collapse; border: none; margin-top: 0em; table-layout: auto; margin-bottom: 2em;\">\n",
       "        <thead>\n",
       "        <tr style=\"border: none;\">\n",
       "            \n",
       "                <th style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\" title=\"Feature contribution already accounts for the feature value (for linear models, contribution = weight * feature value), and the sum of feature contributions is equal to the score or, for some classifiers, to the probability. Feature values are shown if &quot;show_feature_values&quot; is True.\">\n",
       "                    Contribution<sup>?</sup>\n",
       "                </th>\n",
       "            \n",
       "            <th style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">Feature</th>\n",
       "            \n",
       "                <th style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">Value</th>\n",
       "            \n",
       "        </tr>\n",
       "        </thead>\n",
       "        <tbody>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 92.35%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.236\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Sex=female\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.161\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Fare\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            23.250\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.21%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.158\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        Name: Highlighted in text (sum)\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            \n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 94.39%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.152\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Embarked=Q\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(120, 100.00%, 99.16%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        +0.010\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__SibSp\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            2.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "        \n",
       "\n",
       "        \n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 98.24%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.029\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Cabin=\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 96.77%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.069\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Parch\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            0.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 86.37%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.539\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        &lt;BIAS&gt;\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "            <tr style=\"background-color: hsl(0, 100.00%, 80.00%); border: none;\">\n",
       "    <td style=\"padding: 0 1em 0 0.5em; text-align: right; border: none;\">\n",
       "        -0.932\n",
       "    </td>\n",
       "    <td style=\"padding: 0 0.5em 0 0.5em; text-align: left; border: none;\">\n",
       "        All__Pclass=3\n",
       "    </td>\n",
       "    \n",
       "        <td style=\"padding: 0 0.5em 0 1em; text-align: right; border: none;\">\n",
       "            1.000\n",
       "        </td>\n",
       "    \n",
       "</tr>\n",
       "        \n",
       "\n",
       "        </tbody>\n",
       "    </table>\n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n",
       "    <p style=\"margin-bottom: 2.5em; margin-top:-0.5em;\">\n",
       "        <b>Name:</b> <span style=\"opacity: 0.80\">McCo</span><span style=\"background-color: hsl(120, 100.00%, 60.00%); opacity: 1.00\" title=\"0.078\">y, </span><span style=\"opacity: 0.80\">Miss. Agn</span><span style=\"background-color: hsl(0, 100.00%, 81.90%); opacity: 0.86\" title=\"-0.025\">es</span>\n",
       "    </p>\n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "    \n",
       "\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display\n",
    "\n",
    "for idx in [4, 5, 7, 37, 81]:\n",
    "    display(show_prediction(clf2, valid_xs[idx], vec=vec2,\n",
    "                            show_feature_values=True, feature_filter=no_missing))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Text features from the *Name* field are highlighted directly in text, and the sum of weights is shown in the weights table as \"Name: Highlighted in text (sum)\".\n",
    "\n",
    "Looks like name classifier tried to infer both gender and status from the title: \"Mr.\" is bad\n",
    "because women are saved first, and it's better to be \"Mrs.\" (married) than \"Miss.\".\n",
    "Also name classifier is trying to pick some parts of names and surnames, especially endings,\n",
    "perhaps as a proxy for social status.\n",
    "It's especially bad to be \"Mary\" if you are from the third class."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
